Excellent tutorial: https://github.com/hbctraining/DGE_workshop/tree/master/lessons
The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions.
Vignette: https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
library(RColorBrewer)
library(circlize)
library(pheatmap)
library(gplots) #heatmap.2()
library(ComplexHeatmap)
library(tidyverse)
library(vegan)
library(cowplot)
library(ggrepel)
library(devtools)
library(vsn)
library(dplyr)
library(plyr)
library(ggplot2)
library(knitr)
library(gridExtra)
library(data.table)
# first time installation:
# if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
# BiocManager::install("DESeq2")
# BiocManager::install("DEGreport")
# BiocManager::install("arrayQualityMetrics")
library(BiocManager)
library(DESeq2)
library(DEGreport)
library(arrayQualityMetrics)
library(locfit)
setwd("~/Documents/Rprojects/postdoc\ Rprojects/ODU_postdoc_Rprojects/Paper_Ojo_gene-expression/Siderastrea/symbiont/")
As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq in the form of a matrix of integer values. The value in the i-th row and the j-th column of the matrix tells how many reads can be assigned to gene i in sample j.
The values in the matrix should be un-normalized counts or estimated counts of sequencing reads (for single-end RNA-seq) or fragments (for paired-end RNA-seq).
The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input.
symbiont <- read.csv("Sid_hybridref_final_counts_Symbiont_Cladocopium.csv", header=TRUE, row.names=1)
dim(symbiont)
## [1] 47262 56
## Some genes were not detected in any samples (genes with zero reads across all samples)
symbiont <- symbiont[rowSums(symbiont) > 0, ]
dim(symbiont)
## [1] 45327 56
NB! It is absolutely critical that the columns of the count matrix and the rows of the column data (information about samples) are in the same order. There should be the same number of conditions described as there are samples in your data file, and in the same order.
rownames of meta have to match the column names of counts file
meta <- read.csv("~/Documents/Rprojects/postdoc\ Rprojects/ODU_postdoc_Rprojects/Paper_Ojo_gene-expression/Siderastrea/Siderastrea_Year-2_metadata.csv", header=TRUE, row.names=1)
# Remove all samples that were transplanted to the Destination Ojo.Norte
# There is no control site for Norte Ojo
# For now, we are focused on samples transplanted to Laja Control or Laja Ojo sites
# This is consistent with analyses for the other 2 species in the experiment
# Also, group sample size is too small for Ojo.Norte destination
meta <- subset(meta, Destination_name != "Ojo.Norte")
meta$Destination_type <- revalue(meta$Destination_type, c("control" = "Control", "ojo" = "Ojo"))
meta$Origin_type <- revalue(meta$Origin_type, c("Control" = "Lagoon"))
meta <- meta %>% unite(group, Origin_type, Destination_type, sep = ".", remove = FALSE)
meta[sapply(meta, is.character)] <- lapply(meta[sapply(meta, is.character)],
as.factor)
# Colony_ID is equivalent to genotype
meta$Colony_ID <- as.factor(meta$Colony_ID)
levels(meta$group)
## [1] "Lagoon.Control" "Lagoon.Ojo" "Ojo.Control" "Ojo.Ojo"
## [5] "Reef.Control" "Reef.Ojo"
meta <- dplyr::select(meta, Colony_ID, group, Origin_name, Origin_type, Destination_name, Destination_type, pH_Destination)
## The do.call() function produces a data frame with one col per sample,
## transpose it to obtain one row per sample and one column per statistics.
stats.per.sample <- data.frame(t(do.call(cbind, lapply(symbiont, summary))))
stats.per.sample$libsum <- apply(symbiont, 2, sum) ## libsum
stats.per.sample$zeros <- apply(symbiont==0, 2, sum)
stats.per.sample$percent.zeros <- 100*stats.per.sample$zeros/nrow(symbiont)
head(stats.per.sample)
## Min. X1st.Qu. Median Mean X3rd.Qu. Max. libsum zeros
## PC_349_C_Sid_yr2 0 0 0 1.6596289 0 34233 75226 40637
## PC_350_La_Sid_yr2 0 0 0 0.7361617 0 11906 33368 40095
## PC_361_N_Sid_yr2 0 0 0 4.7625256 0 71652 215871 43079
## PC_362_C_Sid_yr2 0 0 0 2.4103514 0 59480 109254 42306
## PC_363_La_Sid_yr2 0 0 0 2.1870188 0 60647 99131 43258
## PC_364_N_Sid_yr2 0 0 0 3.8387054 0 89134 173997 41213
## percent.zeros
## PC_349_C_Sid_yr2 89.65297
## PC_350_La_Sid_yr2 88.45721
## PC_361_N_Sid_yr2 95.04048
## PC_362_C_Sid_yr2 93.33510
## PC_363_La_Sid_yr2 95.43539
## PC_364_N_Sid_yr2 90.92373
par(mfrow=c(3,1))
hist(as.matrix(symbiont), col="blue", border="white", breaks=50)
hist(as.matrix(symbiont), col="blue", border="white",
breaks=20000, xlim=c(0,500), main="Counts per gene",
xlab="Counts (truncated axis)", ylab="Number of genes",
las=1, cex.axis=0.7)
epsilon <- 1 # pseudo-count to avoid problems with log(0)
hist(as.matrix(log2(symbiont + epsilon)), breaks=100, col="blue", border="white",
main="Log2-transformed counts per gene", xlab="log2(counts+1)", ylab="Number of genes",
las=1, cex.axis=0.7)
boxplot(log2(symbiont + epsilon), col=meta$group, pch=".",
horizontal=TRUE, cex.axis=0.5, las=1,
xlab="log2(Counts +1)")
if(!require("affy")){
source("http://bioconductor.org/biocLite.R")
biocLite("affy")
}
library(affy)
## Each curve corresponds to one sample
plotDensity(log2(symbiont + epsilon), lty=1, col= meta$group, lwd=2)
grid()
NB! the R function plotDensity() does not display the actual distribution of your values, but a polynomial fit. The representation thus generally looks smoother than the actual data. It is important to realize that, in some particular cases, the fit can lead to extrapolated values which can be misleading.
The filtering of low-expression genes is a common practice in the analysis of RNA-seq data. There are several reasons for this. For the detection of differentially expressed genes (DEGs) and from a biological point of view, genes that not expressed at a biologically meaningful level in any condition are not of interest and are therefore best ignored, but also because it can increase the number of total DEGs after correction of multiple testing, improving sensitivity and the precision of DEGs after filtering (Bourgon et al., 2010).
In addition, genes/transcripts having a low read count are generally considered as artifacts or ‘noise’. From a statistical point of view, removing low count genes allows the mean-variance relationship in the data to be estimated with greater reliability (Law et al., 2016).
https://seqqc.wordpress.com/2020/02/17/removing-low-count-genes-for-rna-seq-downstream-analysis/
# how many genes have mean count >=3
symbiont_means_filtered <- cbind(symbiont, means = apply(symbiont, 1, mean))
table(symbiont_means_filtered$means>=3)
##
## FALSE TRUE
## 34505 10822
symbiont_means_filtered <- subset(symbiont_means_filtered, symbiont_means_filtered$means>=3)
symbiont_means_filtered <- symbiont_means_filtered[, 1:56]
dim(symbiont_means_filtered)
## [1] 10822 56
drop <- c("PC_361_N_Sid_yr2", "PC_364_N_Sid_yr2", "PC_370_N_Sid_yr2", "PC_376_N_Sid_yr2", "PC_373_N_Sid_yr2", "PO_333_N_Sid_yr2", "PO_336_N_Sid_yr2", "PO_339_N_Sid_yr2", "PO_342_N_Sid_yr2", "PO_345_N_Sid_yr2", "PO_348_N_Sid_yr2", "R_005_N_Sid_yr2", "R_011_N_Sid_yr2", "R_014_N_Sid_yr2", "R_017_N_Sid_yr2", "R_020_N_Sid_yr2", "R_029_N_Sid_yr2", "R_026_N_Sid_yr2")
symbiont_means_filtered <- symbiont_means_filtered[,!(names(symbiont_means_filtered) %in% drop)]
meta <- meta[!(row.names(meta) %in% drop),]
symbiont_means_filtered <- droplevels(symbiont_means_filtered)
meta <- droplevels(meta)
Although RNA-seq technology has improved the dynamic range of gene expression quantification, low-expression genes may be indistinguishable from sampling noise. The presence of noisy, low-expression genes can decrease the sensitivity of detecting DEGs. Thus, identification and filtering of these low-expression genes may improve DEG detection sensitivity.
Genes with very low counts across all samples provide little evidence for differential expression and they interfere with some of the statistical approximations that are used later in the pipeline. They also add to the multiple testing burden when estimating false discovery rates, reducing power to detect differentially expressed genes. These genes should be filtered out prior to further analysis.
For RNA-seq summarized to counts, suggestion to filter on absolute counts, since one could make the argument that genes with low observed counts are probably not really expressed, or that their expression cannot be reliably measured. Furthermore, low count data (which usually also contains many zeros) are not really suitable for a correlation analysis. This simple approach works fine for bulk (tissue) sequencing data but be very careful about applying it to single-cell sequencing where many interesting genes may have zero counts in most cells, and relatively low counts in the rest.
boxplot(log2(symbiont_means_filtered + epsilon), col=meta$group, pch=".",
horizontal=TRUE, cex.axis=0.5, las=1,
xlab="log2(Counts +1)")
prop.null <- apply(symbiont_means_filtered, 2, function(x) 100*mean(x==0))
barplot(prop.null, main="Percentage of null counts per sample",
horiz=TRUE, cex.names=0.5, las=1,
col=meta$group, ylab='Samples', xlab='% of null counts')
Note: 3 samples have >85% of null (zero) counts
prop.null[order(prop.null)]
## R_012_C_Sid_yr2 R_015_C_Sid_yr2 R_021_C_Sid_yr2 R_019_La_Sid_yr2
## 2.633524 5.544262 6.717797 6.856404
## PC_372_La_Sid_yr2 R_010_La_Sid_yr2 R_004_La_Sid_yr2 R_027_C_Sid_yr2
## 7.022731 7.041212 7.096655 7.429311
## R_009_C_Sid_yr2 R_022_La_Sid_yr2 R_003_C_Sid_yr2 PC_371_C_Sid_yr2
## 7.438551 7.771207 8.177786 10.256884
## R_025_La_Sid_yr2 R_024_C_Sid_yr2 R_013_La_Sid_yr2 R_018_C_Sid_yr2
## 21.437812 26.335243 60.293846 60.543338
## PC_350_La_Sid_yr2 R_030_C_Sid_yr2 PO_344_La_Sid_yr2 PC_374_C_Sid_yr2
## 66.346332 68.582517 69.497320 70.735539
## PC_349_C_Sid_yr2 PC_375_La_Sid_yr2 PO_347_La_Sid_yr2 PO_332_La_Sid_yr2
## 70.948069 71.511735 72.306413 72.435779
## PO_338_La_Sid_yr2 PC_366_La_Sid_yr2 PO_343_C_Sid_yr2 PC_368_C_Sid_yr2
## 75.235631 75.318795 76.483090 77.046757
## PO_335_La_Sid_yr2 PO_334_C_Sid_yr2 PO_341_La_Sid_yr2 PC_365_C_Sid_yr2
## 77.074478 77.749030 78.284975 79.523193
## PO_337_C_Sid_yr2 R_031_La_Sid_yr2 PC_362_C_Sid_yr2 PO_340_C_Sid_yr2
## 80.317871 80.428756 82.332286 85.243023
## PC_363_La_Sid_yr2 PO_331_C_Sid_yr2
## 87.091111 87.091111
Remove sample(s) with >85% of null/zero counts
drop <- c("PO_340_C_Sid_yr2", "PC_363_La_Sid_yr2", "PO_331_C_Sid_yr2")
symbiont_means_filtered <- symbiont_means_filtered[,!(names(symbiont_means_filtered) %in% drop)]
meta <- meta[!(row.names(meta) %in% drop),]
prop.null <- apply(symbiont_means_filtered, 2, function(x) 100*mean(x==0))
prop.null[order(prop.null)]
## R_012_C_Sid_yr2 R_015_C_Sid_yr2 R_021_C_Sid_yr2 R_019_La_Sid_yr2
## 2.633524 5.544262 6.717797 6.856404
## PC_372_La_Sid_yr2 R_010_La_Sid_yr2 R_004_La_Sid_yr2 R_027_C_Sid_yr2
## 7.022731 7.041212 7.096655 7.429311
## R_009_C_Sid_yr2 R_022_La_Sid_yr2 R_003_C_Sid_yr2 PC_371_C_Sid_yr2
## 7.438551 7.771207 8.177786 10.256884
## R_025_La_Sid_yr2 R_024_C_Sid_yr2 R_013_La_Sid_yr2 R_018_C_Sid_yr2
## 21.437812 26.335243 60.293846 60.543338
## PC_350_La_Sid_yr2 R_030_C_Sid_yr2 PO_344_La_Sid_yr2 PC_374_C_Sid_yr2
## 66.346332 68.582517 69.497320 70.735539
## PC_349_C_Sid_yr2 PC_375_La_Sid_yr2 PO_347_La_Sid_yr2 PO_332_La_Sid_yr2
## 70.948069 71.511735 72.306413 72.435779
## PO_338_La_Sid_yr2 PC_366_La_Sid_yr2 PO_343_C_Sid_yr2 PC_368_C_Sid_yr2
## 75.235631 75.318795 76.483090 77.046757
## PO_335_La_Sid_yr2 PO_334_C_Sid_yr2 PO_341_La_Sid_yr2 PC_365_C_Sid_yr2
## 77.074478 77.749030 78.284975 79.523193
## PO_337_C_Sid_yr2 R_031_La_Sid_yr2 PC_362_C_Sid_yr2
## 80.317871 80.428756 82.332286
prop.null <- apply(symbiont_means_filtered, 2, function(x) 100*mean(x==0))
barplot(prop.null, main="Percentage of null counts per sample",
horiz=TRUE, cex.names=0.5, las=1,
col=meta$group, ylab='Samples', xlab='% of null counts')
meta %>%
group_by(group) %>%
dplyr::summarise(count = n())
## # A tibble: 6 x 2
## group count
## * <fct> <int>
## 1 Lagoon.Control 6
## 2 Lagoon.Ojo 4
## 3 Ojo.Control 3
## 4 Ojo.Ojo 6
## 5 Reef.Control 9
## 6 Reef.Ojo 7
To determine the appropriate statistical model, we need information about the distribution of counts.
These images illustrate some common features of RNA-seq count data, including - a low number of counts associated with a large proportion of genes, and - a long right tail due to the lack of any upper limit for expression.
# To get an idea about how RNA-seq counts are distributed, plot counts for a couple samples
p1 <- ggplot(symbiont_means_filtered) +
geom_histogram(aes(x = PC_349_C_Sid_yr2), stat = "bin", bins = 200) +
xlim(-5, 500) +
xlab("Raw expression counts") +
ylab("Number of genes")
p2 <- ggplot(symbiont_means_filtered) +
geom_histogram(aes(x = R_025_La_Sid_yr2), stat = "bin", bins = 200) +
xlim(-5, 500) +
xlab("Raw expression counts") +
ylab("Number of genes")
grid.arrange(p1, p2, ncol = 2)
## Warning: Removed 6 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Warning: Removed 9 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
Count data is often modeled using the binomial distribution, which can give you the probability of getting a number of heads upon tossing a coin a number of times. However, not all count data can be fit with the binomial distribution. The binomial is based on discrete events and used in situations when you have a certain number of cases.
When the number of cases is very large (i.e. people who buy lottery tickets), but the probability of an event is very small (probability of winning), the Poisson distribution is used to model these types of count data. The Poisson is similar to the binomial, but is based on continuous events. Details provided by Rafael Irizarry in the EdX class.
With RNA-Seq data, a very large number of RNAs are represented and the probability of pulling out a particular transcript is very small. Thus, it would be an appropriate situation to use the Poisson distribution. However, a unique property of this distribution is that the mean == variance. Realistically, with RNA-Seq data there is always some biological variation present across the replicates (within a sample class). Genes with larger average expression levels will tend to have larger observed variances across replicates.
If the proportions of mRNA stayed exactly constant between the biological replicates for each sample class, we could expect Poisson distribution (where mean == variance). A nice description of this concept is presented by Rafael Irizarry in the EdX class. But this doesn’t happen in practice, and so the Poisson distribution is only considered appropriate for a single biological sample.
The model that fits best, given this type of variability between replicates, is the Negative Binomial (NB) model. Essentially, the NB model is a good approximation for data where the mean < variance, as is the case with RNA-Seq count data.
Remember for the Poisson model, mean = variance, but for Negative Binomial, mean < variance
mean_counts <- apply(symbiont_means_filtered, 1, mean)
variance_counts <- apply(symbiont_means_filtered, 1, var)
df <- data.frame(mean_counts, variance_counts)
ggplot(df) +
geom_point(aes(x=mean_counts, y=variance_counts)) +
geom_line(aes(x=mean_counts, y=mean_counts, color="red")) +
scale_y_log10() +
scale_x_log10() +
theme(legend.position = "none")
Note that in the above figure, the variance across replicates tends to be greater than the mean (red line), especially for genes with large mean expression levels. This is a good indication that our data do not fit the Poisson distribution and we need to account for this increase in variance using the Negative Binomial model (i.e. Poisson will underestimate variability leading to an increase in false positive DE genes).
To accurately model sequencing counts, we need to generate accurate estimates of within-group variation (variation between replicates of the same sample group) for each gene. With only a few (3-6) replicates per group, the estimates of variation for each gene are often unreliable (due to the large differences in dispersion for genes with similar means).
To address this problem, DESeq2 shares information across genes to generate more accurate estimates of variation based on the mean expression level of the gene using a method called ‘shrinkage’. DESeq2 assumes that genes with similar expression levels have similar dispersion.
Estimating the dispersion for each gene separately: To model the dispersion based on expression level (mean counts of replicates), the dispersion for each gene is estimated using maximum likelihood estimation. In other words, given the count values of the replicates, the most likely estimate of dispersion is calculated.
Ensure the row names of the metadata dataframe are present and in the same order as the column names of the counts dataframe.
To create the object we will need the count matrix and the metadata table as input. We will also need to specify a design formula. The design formula specifies the column(s) in the metadata table and how they should be used in the analysis.
# Check that sample names match in both files
all(colnames(symbiont_means_filtered) %in% rownames(meta))
## [1] TRUE
all(colnames(symbiont_means_filtered) == rownames(meta))
## [1] TRUE
# If your data did not match, you could use the match() function to rearrange them to be matching.
levels(meta$group)
## [1] "Lagoon.Control" "Lagoon.Ojo" "Ojo.Control" "Ojo.Ojo"
## [5] "Reef.Control" "Reef.Ojo"
# can only do 2 group comparisons (limitation of DESeq2)
dds <- DESeqDataSetFromMatrix(countData = symbiont_means_filtered,
colData = meta,
design= ~group)
Note that neither rlog transformation nor the VST are used by the differential expression estimation in DESeq, which always occurs on the raw count data, through generalized linear modeling which incorporates knowledge of the variance-mean dependence. The rlog transformation and VST are offered as separate functionality which can be used for visualization, clustering or other machine learning tasks.
In order to test for differential expression, we operate on raw counts and use discrete distributions as described in the previous section on differential expression. However for other downstream analyses – e.g. for visualization or clustering – it is useful to work with transformed versions of the count data.
Both transformations produce transformed data on the log2 scale which has been normalized with respect to library size or other normalization factors.
The point of these two transformations, the VST and the rlog, is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low. In particular, genes with low expression level and therefore low read counts tend to have high variance, which is not removed efficiently by the ordinary logarithmic transformation.
Both VST and rlog use the experiment-wide trend of variance over mean, in order to transform the data to remove the experiment-wide trend. Note that we do not require or desire that all the genes have exactly the same variance after transformation. Indeed, in a figure below, you will see that after the transformations the genes with the same mean do not have exactly the same standard deviations, but that the experiment-wide trend has flattened. It is those genes with row variance above the trend which will allow us to cluster samples into interesting groups.
If many of genes have large differences in counts due to the experimental design, it is important to set blind=FALSE for downstream analysis.
Blind dispersion estimation is not the appropriate choice if one expects that many or the majority of genes (rows) will have large differences in counts which are explainable by the experimental design, and one wishes to transform the data for downstream analysis. In this case, using blind dispersion estimation will lead to large estimates of dispersion, as it attributes differences due to experimental design as unwanted noise, and will result in overly shrinking the transformed values towards each other. By setting blind to FALSE, the dispersions already estimated will be used to perform transformations, or if not present, they will be estimated using the current design formula. Note that only the fitted dispersion estimates from mean-dispersion trend line are used in the transformation (the global dependence of dispersion on mean for the entire experiment). So setting blind to FALSE is still for the most part not using the information about which samples were in which experimental group in applying the transformation.
This function calculates a variance stabilizing transformation (VST) from the fitted dispersion mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors), yielding a matrix of values which are now approximately homoskedastic (having constant variance along the range of mean values). The transformation also normalizes with respect to library size.
The rlog is less sensitive to size factors, which can be an issue when size factors vary widely. These transformations are useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis.
The transformed data should be approximated variance stabilized and also includes correction for size factors or normalization factors. The transformed data is on the log2 scale for large counts.
Limitations: In order to preserve normalization, the same transformation has to be used for all samples. This results in the variance stabilizition to be only approximate. The more the size factors differ, the more residual dependence of the variance on the mean will be found in the transformed data. rlog is a transformation which can perform better in these cases. As shown in the vignette, the function meanSdPlot from the package vsn can be used to see whether this is a problem.
NB! vst() is a wrapper for the varianceStabilizingTransformation() vst() provides much faster estimation of the dispersion trend used to determine the formula for the VST. The speed-up is accomplished by subsetting to a smaller number of genes in order to estimate this dispersion trend. The subset of genes is chosen deterministically, to span the range of genes’ mean normalized count. This wrapper for the VST is not blind to the experimental design: the sample covariate information is used to estimate the global trend of genes’ dispersion values over the genes’ mean normalized count. It can be made strictly blind to experimental design by first assigning a design of ~1 before running this function, or by avoiding subsetting and using varianceStabilizingTransformation.
In case dispersions have not yet been estimated for object, this parameter is passed on to estimateDispersions().
If estimateDispersions was called with:
- fitType="parametric", a closed-form expression for the variance stabilizing transformation is used on the normalized count data.
- fitType="local", the reciprocal of the square root of the variance of the normalized counts, as derived from the dispersion fit, is then numerically integrated, and the integral (approximated by a spline function) is evaluated for each count value in the column, yielding a transformed value.
- fitType="mean", a VST is applied for Negative Binomial distributed counts, ’k’, with a fixed dispersion, ’a’: ( 2 asinh(sqrt(a k)) - log(a) - log(4) )/log(2).
In all cases, the transformation is scaled such that for large counts, it becomes asymptotically (for large values) equal to the logarithm to base 2 of normalized counts.
vsdBlindTrue <- DESeq2::varianceStabilizingTransformation(dds, blind = TRUE, fitType = "parametric")
## -- note: fitType='parametric', but the dispersion trend was not well captured by the
## function: y = a/x + b, and a local regression fit was automatically substituted.
## specify fitType='local' or 'mean' to avoid this message next time.
boxplot(assay(vsdBlindTrue), col=meta$group)
vsdBlindFalse <- DESeq2::varianceStabilizingTransformation(dds, blind = FALSE, fitType = "parametric")
## -- note: fitType='parametric', but the dispersion trend was not well captured by the
## function: y = a/x + b, and a local regression fit was automatically substituted.
## specify fitType='local' or 'mean' to avoid this message next time.
boxplot(assay(vsdBlindFalse), col=meta$group)
Useful for various unsupervised clustering analyses.
Warning message (but not an error): “rlog() may take a few minutes with 30 or more samples, vst() is a much faster transformation”
rlogged.BlindTrue = rlogTransformation(dds, blind = TRUE)
boxplot(assay(rlogged.BlindTrue), col=meta$group)
rlogged.BlindFalse = rlogTransformation(dds, blind = FALSE)
boxplot(assay(rlogged.BlindFalse), col=meta$group)
The figure below plots the standard deviation of the transformed data, across samples, against the mean, using the shifted logarithm transformation, the regularized log transformation and the variance stabilizing transformation. The shifted logarithm has elevated standard deviation in the lower count range, and the regularized log to a lesser extent, while for the variance stabilized data the standard deviation is roughly constant along the whole dynamic range.
Note that the vertical axis in such plots is the square root of the variance over all samples, so including the variance due to the experimental conditions. While a flat curve of the square root of variance over the mean may seem like the goal of such transformations, this may be unreasonable in the case of datasets with many true differences due to the experimental conditions.
The scatterplot of these versus each other allows you to visually verify whether there is a dependence of the standard deviation (or variance) on the mean. The red line depicts the running median estimator (window-width 10%). If there is no variance-mean dependence, then the line should be approximately horizontal.
# this gives log2(n + 1)
ntd <- normTransform(dds)
meanSdPlot(assay(ntd))
meanSdPlot(assay(vsdBlindTrue))
meanSdPlot(assay(vsdBlindFalse))
meanSdPlot(assay(rlogged.BlindTrue))
meanSdPlot(assay(rlogged.BlindFalse))
dists.rlog.BlindFalse <- dist(t(assay(rlogged.BlindFalse)))
plot(hclust(dists.rlog.BlindFalse))
dists.vsdBlindFalse <- dist(t(assay(vsdBlindFalse)))
plot(hclust(dists.vsdBlindFalse))
The input file has to contain all the genes, not just differentially expressed ones. Note that you can use the resulting transformed values only for visualization and clustering (not for differential expression analysis which needs raw counts)
# save varianceStabilizingTransformation counts
saveRDS(vsdBlindFalse, file = "counts_vst.BlindFalse_Sid_Symbiont.rds")
# later, after you already created the file, can load the existing file:
#vsdBlindFalse <- readRDS("counts_vst.BlindFalse_Sid_Symbiont.rds")
To perform the median of ratios method of normalization, DESeq2 has a single estimateSizeFactors() function that will generate size factors for us. We will use the function in the example below, but in a typical RNA-seq analysis this step is automatically performed by the DESeq2() function.
dds <- estimateSizeFactors(dds)
plot(sort(sizeFactors(dds)))
#sort.list(sizeFactors(dds))
sizeFactors(dds)
## PC_349_C_Sid_yr2 PC_350_La_Sid_yr2 PC_362_C_Sid_yr2 PC_365_C_Sid_yr2
## 0.3380383 0.2558145 0.3382474 0.5335437
## PC_366_La_Sid_yr2 PC_368_C_Sid_yr2 PC_371_C_Sid_yr2 PC_372_La_Sid_yr2
## 0.8328818 0.5629804 4.2676224 6.0200141
## PC_374_C_Sid_yr2 PC_375_La_Sid_yr2 PO_332_La_Sid_yr2 PO_334_C_Sid_yr2
## 0.4637911 0.6622722 0.6508436 0.5227593
## PO_335_La_Sid_yr2 PO_337_C_Sid_yr2 PO_338_La_Sid_yr2 PO_341_La_Sid_yr2
## 0.3145150 0.5305822 0.1349094 0.4967547
## PO_343_C_Sid_yr2 PO_344_La_Sid_yr2 PO_347_La_Sid_yr2 R_003_C_Sid_yr2
## 0.4336000 0.4616256 1.0791889 3.2334274
## R_004_La_Sid_yr2 R_009_C_Sid_yr2 R_010_La_Sid_yr2 R_012_C_Sid_yr2
## 4.5598678 3.4599872 4.4672224 1.8665077
## R_013_La_Sid_yr2 R_015_C_Sid_yr2 R_018_C_Sid_yr2 R_019_La_Sid_yr2
## 0.7715778 3.6978293 0.2025338 4.4176576
## R_021_C_Sid_yr2 R_022_La_Sid_yr2 R_024_C_Sid_yr2 R_025_La_Sid_yr2
## 4.8677246 4.4218271 4.6321761 2.6306375
## R_027_C_Sid_yr2 R_030_C_Sid_yr2 R_031_La_Sid_yr2
## 3.1885012 0.5708659 0.1813318
order(sizeFactors(dds), decreasing = TRUE)
## [1] 8 29 31 21 23 30 28 7 26 22 20 33 32 24 19 5 25 10 11 34 6 4 14 12 16
## [26] 9 18 17 3 1 13 2 27 35 15
NOTE: DESeq2 doesn’t actually use normalized counts, rather it uses the raw counts and models the normalization inside the Generalized Linear Model (GLM).
These normalized counts will be useful for downstream visualization of results, but cannot be used as input to DESeq2 or any other tools that perform differential expression analysis which use the negative binomial model.
# to retrieve the normalized counts matrix from dds
normalized_counts <- counts(dds, normalized=TRUE)
rownames(normalized_counts) <- rownames(symbiont_means_filtered)
# Save normalized counts table as R data file for later use
saveRDS(normalized_counts, file = "counts_normalized_Sid_Symbiont.rds")
normalized_counts <- as.data.frame.array(normalized_counts)
setDT(normalized_counts, keep.rownames = "gene")
# Save normalized data matrix for later use:
write.csv(normalized_counts, "counts_normalized_Sid_Symbiont.csv", row.names = F)
colSums(counts(dds))[order(colSums(counts(dds)))]
## R_018_C_Sid_yr2 PC_350_La_Sid_yr2 PO_338_La_Sid_yr2 PO_335_La_Sid_yr2
## 29019 31524 36445 70034
## PC_349_C_Sid_yr2 PO_343_C_Sid_yr2 PC_362_C_Sid_yr2 PC_374_C_Sid_yr2
## 72930 104596 107596 109029
## PO_344_La_Sid_yr2 PO_332_La_Sid_yr2 PC_368_C_Sid_yr2 PC_375_La_Sid_yr2
## 117599 150744 152007 158629
## PO_347_La_Sid_yr2 PO_341_La_Sid_yr2 PO_337_C_Sid_yr2 R_025_La_Sid_yr2
## 172404 176710 186452 206167
## PC_366_La_Sid_yr2 R_031_La_Sid_yr2 R_030_C_Sid_yr2 PO_334_C_Sid_yr2
## 211834 248450 256703 278683
## R_013_La_Sid_yr2 PC_365_C_Sid_yr2 R_003_C_Sid_yr2 R_027_C_Sid_yr2
## 287619 293396 306907 338108
## PC_371_C_Sid_yr2 R_012_C_Sid_yr2 R_009_C_Sid_yr2 R_015_C_Sid_yr2
## 353687 371141 375937 392854
## R_022_La_Sid_yr2 R_010_La_Sid_yr2 R_004_La_Sid_yr2 R_019_La_Sid_yr2
## 417389 460167 479084 492226
## R_021_C_Sid_yr2 PC_372_La_Sid_yr2 R_024_C_Sid_yr2
## 552835 557126 775066
colSums(counts(dds)) %>% barplot(col=meta$group)
# How do the numbers correlate with the size factor?
# Now take a look at the total depth after normalization using:
#colSums(counts(dds, normalized=T))
colSums(counts(dds, normalized=T))[order(colSums(counts(dds, normalized=T)))]
## R_025_La_Sid_yr2 PC_371_C_Sid_yr2 PC_372_La_Sid_yr2 R_022_La_Sid_yr2
## 78371.50 82876.83 92545.63 94392.88
## R_003_C_Sid_yr2 R_010_La_Sid_yr2 R_004_La_Sid_yr2 R_027_C_Sid_yr2
## 94916.93 103009.65 105065.33 106039.79
## R_015_C_Sid_yr2 R_009_C_Sid_yr2 R_019_La_Sid_yr2 R_021_C_Sid_yr2
## 106239.09 108652.72 111422.40 113571.54
## PC_350_La_Sid_yr2 R_018_C_Sid_yr2 PO_347_La_Sid_yr2 R_024_C_Sid_yr2
## 123229.92 143279.78 159753.32 167322.22
## R_012_C_Sid_yr2 PC_349_C_Sid_yr2 PO_335_La_Sid_yr2 PO_332_La_Sid_yr2
## 198842.47 215744.80 222673.01 231613.26
## PC_374_C_Sid_yr2 PC_375_La_Sid_yr2 PO_343_C_Sid_yr2 PC_366_La_Sid_yr2
## 235082.14 239522.35 241226.93 254338.62
## PO_344_La_Sid_yr2 PC_368_C_Sid_yr2 PO_338_La_Sid_yr2 PC_362_C_Sid_yr2
## 254749.76 270004.07 270144.19 318098.51
## PO_337_C_Sid_yr2 PO_341_La_Sid_yr2 R_013_La_Sid_yr2 R_030_C_Sid_yr2
## 351410.22 355728.86 372767.33 449673.05
## PO_334_C_Sid_yr2 PC_365_C_Sid_yr2 R_031_La_Sid_yr2
## 533099.99 549900.62 1370140.58
colSums(counts(dds, normalized = T)) %>% barplot(col=meta$group)
A useful initial step in an RNA-seq analysis is often to assess overall similarity between samples:
To explore the similarity of our samples, we will be performing sample-level QC using Principal Component Analysis (PCA) and hierarchical clustering methods. Our sample-level QC allows us to see how well our replicates cluster together, as well as, observe whether our experimental condition represents the major source of variation in the data. Performing sample-level QC can also identify any sample outliers, which may need to be explored further to determine whether they need to be removed prior to DE analysis.
When using these unsupervised clustering methods, log2-transformation of the normalized counts improves the distances/clustering for visualization. DESeq2 uses a regularized log transform (rlog) of the normalized counts for sample-level QC as it moderates the variance across the mean, improving the clustering.
Here I use varianceStablizingTransformation() transformed counts
Exploratory
By default the function uses the top 500 most variable genes. You can change this by adding the ntop argument and specifying how many genes you want to use to draw the plot.
pcadata = DESeq2::plotPCA(vsdBlindFalse, intgroup = c( "group"), returnData = TRUE)
percentVar = round(100 * attr(pcadata, "percentVar"))
pca = prcomp(t(assay(vsdBlindFalse)), center = TRUE, scale. = FALSE)
DESeq2::plotPCA(vsdBlindFalse, returnData = TRUE, intgroup = c("group")) %>%
ggplot(aes(x = PC1, y = PC2)) +
geom_point(aes(colour = group), size = 2) +
stat_ellipse(geom = "polygon", alpha = 1/10, aes(fill = group)) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
theme_cowplot()
## Too few points to calculate an ellipse
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
pcadata = DESeq2::plotPCA(vsdBlindFalse, intgroup = c("Origin_name"), returnData = TRUE)
percentVar = round(100 * attr(pcadata, "percentVar"))
pca = prcomp(t(assay(vsdBlindFalse)), center = TRUE, scale. = FALSE)
DESeq2::plotPCA(vsdBlindFalse, returnData = TRUE, intgroup = c("Origin_name") ) %>%
ggplot(aes(x = PC1, y = PC2)) +
geom_point(aes(colour = Origin_name), size = 2) +
stat_ellipse(geom = "polygon", alpha = 1/10, aes(fill = Origin_name)) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
theme_cowplot()
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
pcadata = DESeq2::plotPCA(vsdBlindFalse, intgroup = c("Destination_name"), returnData = TRUE)
percentVar = round(100 * attr(pcadata, "percentVar"))
pca = prcomp(t(assay(vsdBlindFalse)), center = TRUE, scale. = FALSE)
DESeq2::plotPCA(vsdBlindFalse, returnData = TRUE, intgroup = c("Destination_name") ) %>%
ggplot(aes(x = PC1, y = PC2)) +
geom_point(aes(colour = Destination_name), size = 2) +
stat_ellipse(geom = "polygon", alpha = 1/10, aes(fill = Destination_name)) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
theme_cowplot()
pcadata = DESeq2::plotPCA(vsdBlindFalse, intgroup = c("Colony_ID"), returnData = TRUE)
percentVar = round(100 * attr(pcadata, "percentVar"))
pca = prcomp(t(assay(vsdBlindFalse)), center = TRUE, scale. = FALSE)
DESeq2::plotPCA(vsdBlindFalse, returnData = TRUE, intgroup = c("Colony_ID")) %>%
ggplot(aes(x = PC1, y = PC2)) +
geom_point(aes(colour = Colony_ID), size = 2) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
theme_cowplot()
Since there is no built-in function for heatmaps in DESeq2 we will be using the pheatmap() function from the pheatmap package. This function requires a matrix/dataframe of numeric values as input, and so the first thing we need to is retrieve that information from the rld object:
rld_mat <- assay(vsdBlindFalse)
rld_cor <- cor(rld_mat)
#head(rld_cor) ## check the output of cor(), make note of the rownames and colnames
pheatmap(rld_cor)
This figure takes overall expression and clusters samples based on euclidean distances.
# sample by distance heatmap
sampleDists <- as.matrix(dist(t(assay(vsdBlindFalse))))
heatmap.2(as.matrix(sampleDists), key=F, trace="none",
col=colorpanel(100, "black", "white"),
margin=c(10, 10))
In addition to examining how well the samples/replicates cluster together, there are a few more QC steps. Prior to differential expression analysis it is beneficial to omit genes that have little or no chance of being detected as differentially expressed. This will increase the power to detect differentially expressed genes. The genes omitted fall into three categories:
- Genes with zero counts in all samples
- Genes with an extreme count outlier
- Genes with a low mean normalized counts (independent filtering)
DESeq2 will perform this filtering by default; however other DE tools, such as EdgeR will not. Filtering is a necessary step, even if you are using limma-voom and/or edgeR’s quasi-likelihood methods. Be sure to follow pre-filtering steps when using other tools, as outlined in their user guides found on Bioconductor as they generally perform much better.
Function performs a default analysis through the steps: - estimation of size factors: estimateSizeFactors - estimation of dispersion: estimateDispersions - Negative Binomial GLM fitting and Wald statistics: nbinomWaldTest
In this step we essentially want to determine whether the mean expression levels of different sample groups are significantly different.
Wald test: the shrunken estimate of LFC is divided by its standard error, resulting in a z-statistic, which is compared to a standard normal distribution.
In DESeq2, we assume that genes of similar average expression strength have similar dispersion. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
a dispersion-mean relation of the form:
dispersion = asymptDisp + extraPois / mean
coefficients asymptDisp and extraPois are given in the attribute coefficients of the dispersionFunction of the object.
# Generate normalized counts
# perform the median of ratios method of normalization
# perform a Wald test in DESeq2 pairwise comparison between treatment effects
dds <- DESeq2::DESeq(dds, fitType = "parametric", test = "Wald")
## using pre-existing size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## -- note: fitType='parametric', but the dispersion trend was not well captured by the
## function: y = a/x + b, and a local regression fit was automatically substituted.
## specify fitType='local' or 'mean' to avoid this message next time.
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 222 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
# copy here text from below:
# -- replacing outliers and refitting for 222 genes
# -- DESeq argument 'minReplicatesForReplace' = 7
# -- original counts are preserved in counts(dds)
– note: fitType=‘parametric’, but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted.
results.dds = results(dds)
head(results.dds)
## log2 fold change (MLE): group Reef.Ojo vs Lagoon.Control
## Wald test p-value: group Reef.Ojo vs Lagoon.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 4.06706
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 3.62464
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 -1.92308
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 -2.64835
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 4.89067
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 4.85779
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 4.81871 0.844015 0.3986613
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.37949 1.523281 0.1276884
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 4.87396 -0.394562 0.6931665
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.05471 -2.510977 0.0120398
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 4.81271 1.016199 0.3095348
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 4.81614 1.008647 0.3131441
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 0.549396
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 0.416137
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.234825
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.508337
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 0.510263
dispersionFunction(dds)
## function (means)
## exp(predict(fit, data.frame(logMeans = log(means))))
## <bytecode: 0x7fd6d227bf00>
## <environment: 0x7fd6db60a2c8>
## attr(,"fitType")
## [1] "local"
## attr(,"varLogDispEsts")
## [1] 0.3599872
## attr(,"dispPriorVar")
## [1] 0.288589
This curve is displayed as a red line in the figure below, which plots the estimate for the expected dispersion value for genes of a given expression strength. Each black dot is a gene with an associated mean expression level and maximum likelihood estimation (MLE) of the dispersion
The curve allows for more accurate identification of differentially expressed genes when sample sizes are small, and the strength of the shrinkage for each gene depends on :
- how close gene dispersions are from the curve
- sample size (more samples = less shrinkage)
This shrinkage method is particularly important to reduce false positives in the differential expression analysis. Genes with low dispersion estimates are shrunken towards the curve, and the more accurate, higher shrunken values are output for fitting of the model and differential expression testing.
Dispersion estimates that are slightly above the curve are also shrunk toward the curve for better dispersion estimation; however, genes with extremely high dispersion values are not. This is due to the likelihood that the gene does not follow the modeling assumptions and has higher variability than others for biological or technical reasons [1]. Shrinking the values toward the curve could result in false positives, so these values are not shrunken. These genes are shown surrounded by blue circles below.
This is a good plot to examine to ensure your data is a good fit for the DESeq2 model. You expect your data to generally scatter around the curve, with the dispersion decreasing with increasing mean expression levels. If you see a cloud or different shapes, then you might want to explore your data more to see if you have contamination (mitochondrial, etc.) or outlier samples. Note how much shrinkage you get across the whole range of means in the plotDispEsts() plot for any experiment with low degrees of freedom.
NB! “parametric” is the default, unless the fit fails, then automatically will be replaced with another fitType (if applicable, will see this in Warning message after running DESeq() function above)
# already accomplished as part of DESeq() function (above):
#dds <- DESeq2::estimateSizeFactors(dds)
#dds <- DESeq2::estimateDispersionsGeneEst(dds)
#dds <- DESeq2::estimateDispersionsFit(dds)
#dds <- DESeq2::estimateDispersionsMAP(dds)
DESeq2::plotDispEsts(dds)
head(dispersions(dds))
## [1] 35.000000 8.101229 35.000000 1.683780 35.000000 35.000000
All of the intermediate values (gene-wise dispersion estimates, fitted dispersion estimates from the trended fit, etc.) are stored in mcols(dds), with information about these columns in mcols(mcols(dds)).
rownames(mcols(mcols(dds)))
## [1] "baseMean"
## [2] "baseVar"
## [3] "allZero"
## [4] "dispGeneEst"
## [5] "dispGeneIter"
## [6] "dispFit"
## [7] "dispersion"
## [8] "dispIter"
## [9] "dispOutlier"
## [10] "dispMAP"
## [11] "Intercept"
## [12] "group_Lagoon.Ojo_vs_Lagoon.Control"
## [13] "group_Ojo.Control_vs_Lagoon.Control"
## [14] "group_Ojo.Ojo_vs_Lagoon.Control"
## [15] "group_Reef.Control_vs_Lagoon.Control"
## [16] "group_Reef.Ojo_vs_Lagoon.Control"
## [17] "SE_Intercept"
## [18] "SE_group_Lagoon.Ojo_vs_Lagoon.Control"
## [19] "SE_group_Ojo.Control_vs_Lagoon.Control"
## [20] "SE_group_Ojo.Ojo_vs_Lagoon.Control"
## [21] "SE_group_Reef.Control_vs_Lagoon.Control"
## [22] "SE_group_Reef.Ojo_vs_Lagoon.Control"
## [23] "WaldStatistic_Intercept"
## [24] "WaldStatistic_group_Lagoon.Ojo_vs_Lagoon.Control"
## [25] "WaldStatistic_group_Ojo.Control_vs_Lagoon.Control"
## [26] "WaldStatistic_group_Ojo.Ojo_vs_Lagoon.Control"
## [27] "WaldStatistic_group_Reef.Control_vs_Lagoon.Control"
## [28] "WaldStatistic_group_Reef.Ojo_vs_Lagoon.Control"
## [29] "WaldPvalue_Intercept"
## [30] "WaldPvalue_group_Lagoon.Ojo_vs_Lagoon.Control"
## [31] "WaldPvalue_group_Ojo.Control_vs_Lagoon.Control"
## [32] "WaldPvalue_group_Ojo.Ojo_vs_Lagoon.Control"
## [33] "WaldPvalue_group_Reef.Control_vs_Lagoon.Control"
## [34] "WaldPvalue_group_Reef.Ojo_vs_Lagoon.Control"
## [35] "betaConv"
## [36] "betaIter"
## [37] "deviance"
## [38] "maxCooks"
## [39] "replace"
dispersions(dds) %>% hist(breaks = 500)
Evaluate contrasts
# It can be useful to include the sample names in the data set object:
rownames(dds) <- rownames(symbiont_means_filtered)
levels(dds$group)
## [1] "Lagoon.Control" "Lagoon.Ojo" "Ojo.Control" "Ojo.Ojo"
## [5] "Reef.Control" "Reef.Ojo"
Note that the ‘results’ function automatically performs independent filtering based on the mean of normalized counts for each gene, optimizing the number of genes which will have an adjusted p value below a given FDR cutoff, alpha.
In each pairwise contrast, we should have the treatment condition first, and the control condition second in the log2 fold change (MLE) output.
Groups: “Lagoon.Control” “Lagoon.Ojo” “Ojo.Control” “Ojo.Ojo”
“Reef.Control” “Reef.Ojo”
LO_LC “Lagoon.Ojo”, “Lagoon.Control”
This contrast does not make sense “Lagoon.Ojo”, “Ojo.Control”
OO_LO “Ojo.Ojo”, “Lagoon.Ojo”
This contrast does not make sense “Lagoon.Ojo”, “Reef.Control”
LO_RO “Lagoon.Ojo”, “Reef.Ojo”
OO_OC “Ojo.Ojo”, “Ojo.Control”
low pH vs. ambient pH, but not comparative sites OO_RC “Ojo.Ojo”, “Reef.Control”
low pH vs. ambient pH, but not comparative sites OO_LC “Ojo.Ojo”, “Lagoon.Control”
OO_RO “Ojo.Ojo”, “Reef.Ojo”
RO_RC “Reef.Ojo”, “Reef.Control”
This contrast does not make sense “Reef.Ojo”, “Lagoon.Control”
This contrast does not make sense “Reef.Ojo”, “Ojo.Control”
OC_LC “Ojo.Control”, “Lagoon.Control”
OC_RC “Ojo.Control”, “Reef.Control”
LC_RC “Lagoon.Control”, “Reef.Control”
p-adj < 0.1
results_LO_LC_.1 <- results(dds, contrast = c("group", "Lagoon.Ojo", "Lagoon.Control"), alpha = 0.1)
head(results_LO_LC_.1)
## log2 fold change (MLE): group Lagoon.Ojo vs Lagoon.Control
## Wald test p-value: group Lagoon.Ojo vs Lagoon.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 0.000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 0.428928
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 1.306555
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 -4.154071
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 -18.831164
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 0.000000
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 5.64156 0.000000 1.00000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.80075 0.153148 0.87828194
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 5.62676 0.232204 0.81637954
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.27138 -3.267363 0.00108555
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 5.65577 -3.329551 0.00086986
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.64156 0.000000 1.00000000
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 1.0000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 1.0000000
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.0851462
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.0709226
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 1.0000000
summary(results_LO_LC_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 3, 0.028%
## LFC < 0 (down) : 57, 0.53%
## outliers [1] : 4, 0.037%
## low counts [2] : 6504, 60%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_LO_LC_.1_df <- data.frame(results_LO_LC_.1)
DE_LO_LC_padj.1 <- res_LO_LC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_LO_LC_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_com… 62.6 -21.2 2.98 -7.10 1.27e-12 5.46e-9
## 2 Cladocopium_Davies_com… 95.9 -20.0 3.53 -5.66 1.55e- 8 3.34e-5
## 3 Cladocopium_Davies_com… 12.7 -20.8 3.77 -5.52 3.48e- 8 5.00e-5
## 4 Cladocopium_Davies_com… 27.1 -19.9 3.68 -5.42 5.99e- 8 6.46e-5
## 5 Cladocopium_Davies_com… 21.5 -18.1 3.46 -5.23 1.71e- 7 1.47e-4
## 6 Cladocopium_Davies_com… 14.2 -20.7 4.16 -4.97 6.60e- 7 4.75e-4
# save rlogged and csv file
saveRDS(DE_LO_LC_padj.1, file = "Sid_Symbiont_DE_genes_LO_LC_padj.1.rds")
write.csv(DE_LO_LC_padj.1, "Sid_Symbiont_DE_genes_LO_LC_padj.1.csv")
up.results_LO_LC_padj.1 = row.names(results_LO_LC_.1[results_LO_LC_.1$padj<0.1 & !(is.na(results_LO_LC_.1$padj)) & results_LO_LC_.1$log2FoldChange>0,])
up.results_LO_LC_padj.1 <- as.data.frame(up.results_LO_LC_padj.1)
up.results_LO_LC_padj.1$DEG <- 'Up'
down.results_LO_LC_padj.1 = row.names(results_LO_LC_.1[results_LO_LC_.1$padj<0.1 & !(is.na(results_LO_LC_.1$padj)) & results_LO_LC_.1$log2FoldChange<0,])
down.results_LO_LC_padj.1 <- as.data.frame(down.results_LO_LC_padj.1)
down.results_LO_LC_padj.1$DEG <- 'Down'
LO_LC_padj.1_summary <- merge(up.results_LO_LC_padj.1, down.results_LO_LC_padj.1, all=T)
dim(LO_LC_padj.1_summary)
## [1] 60 3
# save file
write.csv(LO_LC_padj.1_summary, file="Sid_Symbiont_DE_genes_LO_LC_padj.1_summary.csv")
This is later used as input for PERMANOVA
### convert to dataframe (from DESeq2 object)
res_LO_LC_.1_df <- data.frame(results_LO_LC_.1)
DE_LO_LC_pvalue <- res_LO_LC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_LO_LC_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_com… 62.6 -21.2 2.98 -7.10 1.27e-12 5.46e-9
## 2 Cladocopium_Davies_com… 95.9 -20.0 3.53 -5.66 1.55e- 8 3.34e-5
## 3 Cladocopium_Davies_com… 12.7 -20.8 3.77 -5.52 3.48e- 8 5.00e-5
## 4 Cladocopium_Davies_com… 27.1 -19.9 3.68 -5.42 5.99e- 8 6.46e-5
## 5 Cladocopium_Davies_com… 21.5 -18.1 3.46 -5.23 1.71e- 7 1.47e-4
## 6 Cladocopium_Davies_com… 14.2 -20.7 4.16 -4.97 6.60e- 7 4.75e-4
# save rlogged and csv file
saveRDS(DE_LO_LC_pvalue, file = "Sid_Symbiont_LO_LC_pval.05.rds")
write.csv(DE_LO_LC_pvalue, "Sid_Symbiont_LO_LC_pval.05.csv")
dim(DE_LO_LC_pvalue)
## [1] 107 7
p-adj < 0.1
results_OO_LO_.1 <- results(dds, contrast = c("group", "Ojo.Ojo", "Lagoon.Ojo"), alpha = 0.1)
head(results_OO_LO_.1)
## log2 fold change (MLE): group Ojo.Ojo vs Lagoon.Ojo
## Wald test p-value: group Ojo.Ojo vs Lagoon.Ojo
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 0.000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 1.033400
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 -0.456713
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 2.471104
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 19.729024
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 0.000000
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 5.64873 0.0000000 1.000000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.76394 0.3738866 0.708488685
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 5.63136 -0.0811017 0.935361048
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.28132 1.9285628 0.053785165
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 5.65402 3.4893796 0.000484143
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.64873 0.0000000 1.000000000
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 0.9993167
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.5157036
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.0159291
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 1.0000000
summary(results_OO_LO_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 63, 0.58%
## LFC < 0 (down) : 9, 0.083%
## outliers [1] : 4, 0.037%
## low counts [2] : 8811, 81%
## (mean count < 5)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_OO_LO_.1_df <- data.frame(results_OO_LO_.1)
DE_OO_LO_padj.1 <- res_OO_LO_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_OO_LO_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 25.9 37.3 3.09 12.1 1.52e-33 3.05e-30
## 2 Cladocopium_Davies_comp… 18.4 40.7 5.62 7.25 4.10e-13 4.12e-10
## 3 Cladocopium_Davies_comp… 62.6 21.4 2.99 7.17 7.70e-13 5.15e-10
## 4 Cladocopium_Davies_comp… 95.9 21.0 3.52 5.97 2.39e- 9 1.20e- 6
## 5 Cladocopium_Davies_comp… 15.9 32.8 5.62 5.83 5.62e- 9 1.95e- 6
## 6 Cladocopium_Davies_comp… 11.2 32.7 5.63 5.82 5.83e- 9 1.95e- 6
# save rlogged and csv file
saveRDS(DE_OO_LO_padj.1, file = "Sid_Symbiont_DE_genes_OO_LO_padj.1.rds")
write.csv(DE_OO_LO_padj.1, "Sid_Symbiont_DE_genes_OO_LO_padj.1.csv")
up.results_OO_LO_padj.1 = row.names(results_OO_LO_.1[results_OO_LO_.1$padj<0.1 & !(is.na(results_OO_LO_.1$padj)) & results_OO_LO_.1$log2FoldChange>0,])
up.results_OO_LO_padj.1 <- as.data.frame(up.results_OO_LO_padj.1)
up.results_OO_LO_padj.1$DEG <- 'Up'
down.results_OO_LO_padj.1 = row.names(results_OO_LO_.1[results_OO_LO_.1$padj<0.1 & !(is.na(results_OO_LO_.1$padj)) & results_OO_LO_.1$log2FoldChange<0,])
down.results_OO_LO_padj.1 <- as.data.frame(down.results_OO_LO_padj.1)
down.results_OO_LO_padj.1$DEG <- 'Down'
OO_LO_padj.1_summary <- merge(up.results_OO_LO_padj.1, down.results_OO_LO_padj.1, all=T)
dim(OO_LO_padj.1_summary)
## [1] 72 3
# save file
write.csv(OO_LO_padj.1_summary, file="Sid_Symbiont_DE_genes_OO_LO_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_OO_LO_pvalue <- res_OO_LO_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_OO_LO_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 25.9 37.3 3.09 12.1 1.52e-33 3.05e-30
## 2 Cladocopium_Davies_comp… 18.4 40.7 5.62 7.25 4.10e-13 4.12e-10
## 3 Cladocopium_Davies_comp… 62.6 21.4 2.99 7.17 7.70e-13 5.15e-10
## 4 Cladocopium_Davies_comp… 95.9 21.0 3.52 5.97 2.39e- 9 1.20e- 6
## 5 Cladocopium_Davies_comp… 15.9 32.8 5.62 5.83 5.62e- 9 1.95e- 6
## 6 Cladocopium_Davies_comp… 11.2 32.7 5.63 5.82 5.83e- 9 1.95e- 6
# save rlogged and csv file
saveRDS(DE_OO_LO_pvalue, file = "Sid_Symbiont_OO_LO_pval.05.rds")
write.csv(DE_OO_LO_pvalue, "Sid_Symbiont_OO_LO_pval.05.csv")
dim(DE_OO_LO_pvalue)
## [1] 464 7
p-adj < 0.1
results_LO_RO_.1 <- results(dds, contrast = c("group", "Lagoon.Ojo", "Reef.Ojo"), alpha = 0.1)
head(results_LO_RO_.1)
## log2 fold change (MLE): group Lagoon.Ojo vs Reef.Ojo
## Wald test p-value: group Lagoon.Ojo vs Reef.Ojo
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 -4.51967
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 -3.19571
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 3.22963
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 -1.50572
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 -23.72183
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 -5.31046
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 5.43498 -0.831588 0.405641347
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.65228 -1.204892 0.228245017
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 5.46880 0.590556 0.554818143
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.24346 -1.210913 0.225928622
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 5.44774 -4.354435 0.000013341
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.43271 -0.977498 0.328322687
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 0.88246791
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 0.88246791
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.88246791
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.88246791
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.00084896
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 0.88246791
summary(results_LO_RO_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 136, 1.3%
## LFC < 0 (down) : 91, 0.84%
## outliers [1] : 4, 0.037%
## low counts [2] : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_LO_RO_.1_df <- data.frame(results_LO_RO_.1)
DE_LO_RO_padj.1 <- res_LO_RO_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_LO_RO_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_com… 25.9 -41.4 2.97 -13.9 3.43e-44 3.71e-40
## 2 Cladocopium_Davies_com… 62.6 -24.4 2.90 -8.42 3.84e-17 2.08e-13
## 3 Cladocopium_Davies_com… 18.4 -43.4 5.45 -7.96 1.70e-15 6.15e-12
## 4 Cladocopium_Davies_com… 95.9 -25.5 3.41 -7.46 8.61e-14 2.33e-10
## 5 Cladocopium_Davies_com… 21.5 -23.5 3.29 -7.14 9.46e-13 2.05e- 9
## 6 Cladocopium_Davies_com… 12.7 -25.4 3.59 -7.06 1.65e-12 2.97e- 9
# save rlogged and csv file
saveRDS(DE_LO_RO_padj.1, file = "Sid_Symbiont_DE_genes_LO_RO_padj.1.rds")
write.csv(DE_LO_RO_padj.1, "Sid_Symbiont_DE_genes_LO_RO_padj.1.csv")
up.results_LO_RO_padj.1 = row.names(results_LO_RO_.1[results_LO_RO_.1$padj<0.1 & !(is.na(results_LO_RO_.1$padj)) & results_LO_RO_.1$log2FoldChange>0,])
up.results_LO_RO_padj.1 <- as.data.frame(up.results_LO_RO_padj.1)
up.results_LO_RO_padj.1$DEG <- 'Up'
down.results_LO_RO_padj.1 = row.names(results_LO_RO_.1[results_LO_RO_.1$padj<0.1 & !(is.na(results_LO_RO_.1$padj)) & results_LO_RO_.1$log2FoldChange<0,])
down.results_LO_RO_padj.1 <- as.data.frame(down.results_LO_RO_padj.1)
down.results_LO_RO_padj.1$DEG <- 'Down'
LO_RO_padj.1_summary <- merge(up.results_LO_RO_padj.1, down.results_LO_RO_padj.1, all=T)
dim(LO_RO_padj.1_summary)
## [1] 227 3
# save file
write.csv(LO_RO_padj.1_summary, file="Sid_Symbiont_DE_genes_LO_RO_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_LO_RO_pvalue <- res_LO_RO_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_LO_RO_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_com… 25.9 -41.4 2.97 -13.9 3.43e-44 3.71e-40
## 2 Cladocopium_Davies_com… 62.6 -24.4 2.90 -8.42 3.84e-17 2.08e-13
## 3 Cladocopium_Davies_com… 18.4 -43.4 5.45 -7.96 1.70e-15 6.15e-12
## 4 Cladocopium_Davies_com… 95.9 -25.5 3.41 -7.46 8.61e-14 2.33e-10
## 5 Cladocopium_Davies_com… 21.5 -23.5 3.29 -7.14 9.46e-13 2.05e- 9
## 6 Cladocopium_Davies_com… 12.7 -25.4 3.59 -7.06 1.65e-12 2.97e- 9
# save rlogged and csv file
saveRDS(DE_LO_RO_pvalue, file = "Sid_Symbiont_LO_RO_pval.05.rds")
write.csv(DE_LO_RO_pvalue, "Sid_Symbiont_LO_RO_pval.05.csv")
dim(DE_LO_RO_pvalue)
## [1] 366 7
p-adj < 0.1
results_OO_OC_.1 <- results(dds, contrast = c("group", "Ojo.Ojo", "Ojo.Control"), alpha = 0.1)
head(results_OO_OC_.1)
## log2 fold change (MLE): group Ojo.Ojo vs Ojo.Control
## Wald test p-value: group Ojo.Ojo vs Ojo.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 0.000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 0.290174
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 0.726623
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 -3.456279
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 15.255499
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 -2.860805
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 6.20513 0.0000000 1.00000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 3.00815 0.0964624 0.92315333
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 6.20279 0.1171446 0.90674550
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.33761 -2.5839259 0.00976828
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 6.19556 2.4623289 0.01380380
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 6.12995 -0.4666927 0.64071978
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 1.000000
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.376311
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.475646
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 1.000000
summary(results_OO_OC_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 33, 0.3%
## LFC < 0 (down) : 9, 0.083%
## outliers [1] : 4, 0.037%
## low counts [2] : 8391, 78%
## (mean count < 5)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_OO_OC_.1_df <- data.frame(results_OO_OC_.1)
DE_OO_OC_padj.1 <- res_OO_OC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_OO_OC_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 9.85 33.0 3.80 8.69 3.65e-18 8.87e-15
## 2 Cladocopium_Davies_comp… 7.06 33.4 4.29 7.78 7.29e-15 8.84e-12
## 3 Cladocopium_Davies_comp… 7.67 34.7 6.20 5.60 2.20e- 8 1.78e- 5
## 4 Cladocopium_Davies_comp… 11.0 28.0 5.49 5.09 3.56e- 7 2.16e- 4
## 5 Cladocopium_Davies_comp… 12.7 17.0 3.46 4.93 8.41e- 7 4.08e- 4
## 6 Cladocopium_Davies_comp… 15.9 28.8 6.17 4.67 3.01e- 6 9.41e- 4
# save rlogged and csv file
saveRDS(DE_OO_OC_padj.1, file = "Sid_Symbiont_DE_genes_OO_OC_padj.1.rds")
write.csv(DE_OO_OC_padj.1, "Sid_Symbiont_DE_genes_OO_OC_padj.1.csv")
up.results_OO_OC_padj.1 = row.names(results_OO_OC_.1[results_OO_OC_.1$padj<0.1 & !(is.na(results_OO_OC_.1$padj)) & results_OO_OC_.1$log2FoldChange>0,])
up.results_OO_OC_padj.1 <- as.data.frame(up.results_OO_OC_padj.1)
up.results_OO_OC_padj.1$DEG <- 'Up'
down.results_OO_OC_padj.1 = row.names(results_OO_OC_.1[results_OO_OC_.1$padj<0.1 & !(is.na(results_OO_OC_.1$padj)) & results_OO_OC_.1$log2FoldChange<0,])
down.results_OO_OC_padj.1 <- as.data.frame(down.results_OO_OC_padj.1)
down.results_OO_OC_padj.1$DEG <- 'Down'
OO_OC_padj.1_summary <- merge(up.results_OO_OC_padj.1, down.results_OO_OC_padj.1, all=T)
dim(OO_OC_padj.1_summary)
## [1] 42 3
# save file
write.csv(OO_OC_padj.1_summary, file="Sid_Symbiont_DE_genes_OO_OC_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_OO_OC_pvalue <- res_OO_OC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_OO_OC_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 9.85 33.0 3.80 8.69 3.65e-18 8.87e-15
## 2 Cladocopium_Davies_comp… 7.06 33.4 4.29 7.78 7.29e-15 8.84e-12
## 3 Cladocopium_Davies_comp… 7.67 34.7 6.20 5.60 2.20e- 8 1.78e- 5
## 4 Cladocopium_Davies_comp… 11.0 28.0 5.49 5.09 3.56e- 7 2.16e- 4
## 5 Cladocopium_Davies_comp… 12.7 17.0 3.46 4.93 8.41e- 7 4.08e- 4
## 6 Cladocopium_Davies_comp… 15.9 28.8 6.17 4.67 3.01e- 6 9.41e- 4
# save rlogged and csv file
saveRDS(DE_OO_OC_pvalue, file = "Sid_Symbiont_OO_OC_pval.05.rds")
write.csv(DE_OO_OC_pvalue, "Sid_Symbiont_OO_OC_pval.05.csv")
dim(DE_OO_OC_pvalue)
## [1] 146 7
p-adj < 0.1
results_OO_RO_.1 <- results(dds, contrast = c("group", "Ojo.Ojo", "Reef.Ojo"), alpha = 0.1)
head(results_OO_RO_.1)
## log2 fold change (MLE): group Ojo.Ojo vs Reef.Ojo
## Wald test p-value: group Ojo.Ojo vs Reef.Ojo
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 -3.763659
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 -2.162310
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 2.772920
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 0.965381
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 -3.992809
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 -4.554452
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 4.82706 -0.779701 0.435567
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.33605 -0.925628 0.354639
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 4.87924 0.568310 0.569825
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.06666 0.905048 0.365440
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 4.81061 -0.830000 0.406539
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 4.82449 -0.944027 0.345156
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 0.504646
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 0.431507
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.441241
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.478598
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 0.423066
summary(results_OO_RO_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 171, 1.6%
## LFC < 0 (down) : 2755, 25%
## outliers [1] : 4, 0.037%
## low counts [2] : 1679, 16%
## (mean count < 1)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_OO_RO_.1_df <- data.frame(results_OO_RO_.1)
DE_OO_RO_padj.1 <- res_OO_RO_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_OO_RO_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 102. 10.6 1.34 7.90 2.82e-15 2.58e-11
## 2 Cladocopium_Davies_comp… 30.8 8.63 1.13 7.64 2.12e-14 6.47e-11
## 3 Cladocopium_Davies_comp… 29.6 9.49 1.24 7.66 1.92e-14 6.47e-11
## 4 Cladocopium_Davies_comp… 11.6 -35.8 4.82 -7.42 1.17e-13 2.68e-10
## 5 Cladocopium_Davies_comp… 27.5 9.24 1.25 7.38 1.61e-13 2.95e-10
## 6 Cladocopium_Davies_comp… 49.4 9.37 1.29 7.28 3.32e-13 5.06e-10
# save rlogged and csv files:
saveRDS(DE_OO_RO_padj.1, file = "Sid_Symbiont_DE_genes_OO_RO_padj.1.rds")
write.csv(DE_OO_RO_padj.1, "Sid_Symbiont_DE_genes_OO_RO_padj.1.csv")
up.results_OO_RO_padj.1 = row.names(results_OO_RO_.1[results_OO_RO_.1$padj<0.1 & !(is.na(results_OO_RO_.1$padj)) & results_OO_RO_.1$log2FoldChange>0,])
up.results_OO_RO_padj.1 <- as.data.frame(up.results_OO_RO_padj.1)
up.results_OO_RO_padj.1$DEG <- 'Up'
down.results_OO_RO_padj.1 = row.names(results_OO_RO_.1[results_OO_RO_.1$padj<0.1 & !(is.na(results_OO_RO_.1$padj)) & results_OO_RO_.1$log2FoldChange<0,])
down.results_OO_RO_padj.1 <- as.data.frame(down.results_OO_RO_padj.1)
down.results_OO_RO_padj.1$DEG <- 'Down'
OO_RO_padj.1_summary <- merge(up.results_OO_RO_padj.1, down.results_OO_RO_padj.1, all=T)
dim(OO_RO_padj.1_summary)
## [1] 2926 3
# save file
write.csv(OO_RO_padj.1_summary, file="Sid_Symbiont_DE_genes_OO_RO_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_OO_RO_pvalue <- res_OO_RO_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_OO_RO_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 102. 10.6 1.34 7.90 2.82e-15 2.58e-11
## 2 Cladocopium_Davies_comp… 29.6 9.49 1.24 7.66 1.92e-14 6.47e-11
## 3 Cladocopium_Davies_comp… 30.8 8.63 1.13 7.64 2.12e-14 6.47e-11
## 4 Cladocopium_Davies_comp… 11.6 -35.8 4.82 -7.42 1.17e-13 2.68e-10
## 5 Cladocopium_Davies_comp… 27.5 9.24 1.25 7.38 1.61e-13 2.95e-10
## 6 Cladocopium_Davies_comp… 49.4 9.37 1.29 7.28 3.32e-13 5.06e-10
# save rlogged and csv files:
saveRDS(DE_OO_RO_pvalue, file = "Sid_Symbiont_OO_RO_pval.05.rds")
write.csv(DE_OO_RO_pvalue, "Sid_Symbiont_OO_RO_pval.05.csv")
dim(DE_OO_RO_pvalue)
## [1] 3952 7
p-adj < 0.1
results_RO_RC_.1 <- results(dds, contrast = c("group", "Reef.Ojo", "Reef.Control"), alpha = 0.1)
head(results_RO_RC_.1)
## log2 fold change (MLE): group Reef.Ojo vs Reef.Control
## Wald test p-value: group Reef.Ojo vs Reef.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 -0.779780
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 0.659113
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 -4.904691
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 -0.639050
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 -0.816555
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 -0.469931
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 4.310761 -0.180891 0.856453
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.077914 0.317199 0.751092
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 4.379204 -1.119996 0.262715
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.958949 -0.666407 0.505151
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 4.304335 -0.189705 0.849540
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 4.307143 -0.109105 0.913119
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 1
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 1
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 1
summary(results_RO_RC_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 8, 0.074%
## LFC < 0 (down) : 1, 0.0092%
## outliers [1] : 4, 0.037%
## low counts [2] : 8182, 76%
## (mean count < 5)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_RO_RC_.1_df <- data.frame(results_RO_RC_.1)
DE_RO_RC_padj.1 <- res_RO_RC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_RO_RC_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_com… 48.5 4.78 0.798 5.98 2.18e-9 5.75e-6
## 2 Cladocopium_Davies_com… 31.0 2.97 0.638 4.65 3.35e-6 4.41e-3
## 3 Cladocopium_Davies_com… 21.4 2.87 0.649 4.42 9.68e-6 8.50e-3
## 4 Cladocopium_Davies_com… 24.6 2.75 0.667 4.13 3.64e-5 2.40e-2
## 5 Cladocopium_Davies_com… 8.51 3.52 0.904 3.89 1.00e-4 4.39e-2
## 6 Cladocopium_Davies_com… 6.75 3.72 0.952 3.91 9.34e-5 4.39e-2
# save rlogged file
saveRDS(DE_RO_RC_padj.1, file = "Sid_Symbiont_DE_genes_RO_RC_padj.1.rds")
write.csv(DE_RO_RC_padj.1, "Sid_Symbiont_DE_genes_RO_RC_padj.1.csv")
up.results_RO_RC_padj.1 = row.names(results_RO_RC_.1[results_RO_RC_.1$padj<0.1 & !(is.na(results_RO_RC_.1$padj)) & results_RO_RC_.1$log2FoldChange>0,])
up.results_RO_RC_padj.1 <- as.data.frame(up.results_RO_RC_padj.1)
up.results_RO_RC_padj.1$DEG <- 'Up'
down.results_RO_RC_padj.1 = row.names(results_RO_RC_.1[results_RO_RC_.1$padj<0.1 & !(is.na(results_RO_RC_.1$padj)) & results_RO_RC_.1$log2FoldChange<0,])
down.results_RO_RC_padj.1 <- as.data.frame(down.results_RO_RC_padj.1)
down.results_RO_RC_padj.1$DEG <- 'Down'
RO_RC_padj.1_summary <- merge(up.results_RO_RC_padj.1, down.results_RO_RC_padj.1, all=T)
dim(RO_RC_padj.1_summary)
## [1] 9 3
# save file
write.csv(RO_RC_padj.1_summary, file="Sid_Symbiont_DE_genes_RO_RC_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_RO_RC_pvalue <- res_RO_RC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_RO_RC_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_com… 48.5 4.78 0.798 5.98 2.18e-9 5.75e-6
## 2 Cladocopium_Davies_com… 31.0 2.97 0.638 4.65 3.35e-6 4.41e-3
## 3 Cladocopium_Davies_com… 21.4 2.87 0.649 4.42 9.68e-6 8.50e-3
## 4 Cladocopium_Davies_com… 24.6 2.75 0.667 4.13 3.64e-5 2.40e-2
## 5 Cladocopium_Davies_com… 6.75 3.72 0.952 3.91 9.34e-5 4.39e-2
## 6 Cladocopium_Davies_com… 8.51 3.52 0.904 3.89 1.00e-4 4.39e-2
# save rlogged file
saveRDS(DE_RO_RC_pvalue, file = "Sid_Symbiont_RO_RC_pval.05.rds")
write.csv(DE_RO_RC_pvalue, "Sid_Symbiont_RO_RC_pval.05.csv")
dim(DE_RO_RC_pvalue)
## [1] 494 7
p-adj < 0.1
results_OC_LC_.1 <- results(dds, contrast = c("group", "Ojo.Control", "Lagoon.Control"), alpha = 0.1)
head(results_OC_LC_.1)
## log2 fold change (MLE): group Ojo.Control vs Lagoon.Control
## Wald test p-value: group Ojo.Control vs Lagoon.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 0.00000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 1.17215
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 0.00000
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 1.77331
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 -14.35764
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 3.16414
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 6.19858 0.000000 1.0000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 3.04201 0.385322 0.6999985
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 6.19858 0.000000 1.0000000
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.32809 1.335231 0.1818008
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 6.19712 -2.316823 0.0205134
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 6.12332 0.516736 0.6053407
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 1.000000
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.730375
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.413885
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont NA
summary(results_OC_LC_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 3, 0.028%
## LFC < 0 (down) : 10, 0.092%
## outliers [1] : 4, 0.037%
## low counts [2] : 9230, 85%
## (mean count < 6)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_OC_LC_.1_df <- data.frame(results_OC_LC_.1)
DE_OC_LC_padj.1 <- res_OC_LC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_OC_LC_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp7… 12.7 -16.0 3.48 -4.59 4.37e-6 0.00693
## 2 Cladocopium_Davies_comp2… 25.9 14.6 3.46 4.22 2.46e-5 0.0162
## 3 Cladocopium_Davies_comp2… 13.8 16.7 4.01 4.17 3.05e-5 0.0162
## 4 Cladocopium_Davies_comp2… 9.99 -17.3 4.32 -4.00 6.43e-5 0.0255
## 5 Cladocopium_Davies_comp2… 39.2 -15.4 4.04 -3.82 1.34e-4 0.0380
## 6 Cladocopium_Davies_comp2… 22.2 -16.1 4.24 -3.80 1.44e-4 0.0380
# save rlogged and csv files
saveRDS(DE_OC_LC_padj.1, file = "Sid_Symbiont_DE_genes_OC_LC_padj.1.rds")
write.csv(DE_OC_LC_padj.1, "Sid_Symbiont_DE_genes_OC_LC_padj.1.csv")
up.results_OC_LC_padj.1 = row.names(results_OC_LC_.1[results_OC_LC_.1$padj<0.1 & !(is.na(results_OC_LC_.1$padj)) & results_OC_LC_.1$log2FoldChange>0,])
up.results_OC_LC_padj.1 <- as.data.frame(up.results_OC_LC_padj.1)
up.results_OC_LC_padj.1$DEG <- 'Up'
down.results_OC_LC_padj.1 = row.names(results_OC_LC_.1[results_OC_LC_.1$padj<0.1 & !(is.na(results_OC_LC_.1$padj)) & results_OC_LC_.1$log2FoldChange<0,])
down.results_OC_LC_padj.1 <- as.data.frame(down.results_OC_LC_padj.1)
down.results_OC_LC_padj.1$DEG <- 'Down'
OC_LC_padj.1_summary <- merge(up.results_OC_LC_padj.1, down.results_OC_LC_padj.1, all=T)
dim(OC_LC_padj.1_summary)
## [1] 13 3
# save file
write.csv(OC_LC_padj.1_summary, file="Sid_Symbiont_DE_genes_OC_LC_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_OC_LC_pvalue <- res_OC_LC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_OC_LC_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp7… 12.7 -16.0 3.48 -4.59 4.37e-6 0.00693
## 2 Cladocopium_Davies_comp2… 25.9 14.6 3.46 4.22 2.46e-5 0.0162
## 3 Cladocopium_Davies_comp2… 13.8 16.7 4.01 4.17 3.05e-5 0.0162
## 4 Cladocopium_Davies_comp2… 9.99 -17.3 4.32 -4.00 6.43e-5 0.0255
## 5 Cladocopium_Davies_comp2… 39.2 -15.4 4.04 -3.82 1.34e-4 0.0380
## 6 Cladocopium_Davies_comp2… 22.2 -16.1 4.24 -3.80 1.44e-4 0.0380
# save rlogged and csv files
saveRDS(DE_OC_LC_pvalue, file = "Sid_Symbiont_OC_LC_pval.05.rds")
write.csv(DE_OC_LC_pvalue, "Sid_Symbiont_OC_LC_pval.05.csv")
dim(DE_OC_LC_pvalue)
## [1] 237 7
p-adj < 0.1
results_OC_RC_.1 <- results(dds, contrast = c("group", "Ojo.Control", "Reef.Control"), alpha = 0.1)
head(results_OC_RC_.1)
## log2 fold change (MLE): group Ojo.Control vs Reef.Control
## Wald test p-value: group Ojo.Control vs Reef.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 -4.72371
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 -1.79337
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 -2.85839
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 3.78261
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 -20.06486
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 -2.16358
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 5.81256 -0.812672 0.416406068
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.81240 -0.637665 0.523691679
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 5.81765 -0.491331 0.623192097
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.25340 3.017885 0.002545457
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 5.81122 -3.452781 0.000554839
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.73167 -0.377478 0.705818677
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 0.4609784
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 0.5611427
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.6540265
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.0415625
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.0195513
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 0.7303959
summary(results_OC_RC_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 197, 1.8%
## LFC < 0 (down) : 5439, 50%
## outliers [1] : 4, 0.037%
## low counts [2] : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_OC_RC_.1_df <- data.frame(results_OC_RC_.1)
DE_OC_RC_padj.1 <- res_OC_RC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_OC_RC_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_com… 9.85 -36.4 3.56 -10.2 1.55e-24 1.68e-20
## 2 Cladocopium_Davies_com… 7.06 -37.6 4.00 -9.40 5.54e-21 3.00e-17
## 3 Cladocopium_Davies_com… 16.4 6.22 0.908 6.85 7.39e-12 2.66e- 8
## 4 Cladocopium_Davies_com… 29.6 9.06 1.33 6.79 1.14e-11 3.08e- 8
## 5 Cladocopium_Davies_com… 13.9 8.29 1.23 6.74 1.60e-11 3.46e- 8
## 6 Cladocopium_Davies_com… 30.8 8.22 1.23 6.69 2.25e-11 4.06e- 8
# save rlogged and csv files
saveRDS(DE_OC_RC_padj.1, file = "Sid_Symbiont_DE_genes_OC_RC_padj.1.rds")
write.csv(DE_OC_RC_padj.1, "Sid_Symbiont_DE_genes_OC_RC_padj.1.csv")
up.results_OC_RC_padj.1 = row.names(results_OC_RC_.1[results_OC_RC_.1$padj<0.1 & !(is.na(results_OC_RC_.1$padj)) & results_OC_RC_.1$log2FoldChange>0,])
up.results_OC_RC_padj.1 <- as.data.frame(up.results_OC_RC_padj.1)
up.results_OC_RC_padj.1$DEG <- 'Up'
down.results_OC_RC_padj.1 = row.names(results_OC_RC_.1[results_OC_RC_.1$padj<0.1 & !(is.na(results_OC_RC_.1$padj)) & results_OC_RC_.1$log2FoldChange<0,])
down.results_OC_RC_padj.1 <- as.data.frame(down.results_OC_RC_padj.1)
down.results_OC_RC_padj.1$DEG <- 'Down'
OC_RC_padj.1_summary <- merge(up.results_OC_RC_padj.1, down.results_OC_RC_padj.1, all=T)
dim(OC_RC_padj.1_summary)
## [1] 5636 3
# save file
write.csv(OC_RC_padj.1_summary , file="Sid_Symbiont_DE_genes_OC_RC_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_OC_RC_pvalue <- res_OC_RC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_OC_RC_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_com… 9.85 -36.4 3.56 -10.2 1.55e-24 1.68e-20
## 2 Cladocopium_Davies_com… 7.06 -37.6 4.00 -9.40 5.54e-21 3.00e-17
## 3 Cladocopium_Davies_com… 16.4 6.22 0.908 6.85 7.39e-12 2.66e- 8
## 4 Cladocopium_Davies_com… 29.6 9.06 1.33 6.79 1.14e-11 3.08e- 8
## 5 Cladocopium_Davies_com… 13.9 8.29 1.23 6.74 1.60e-11 3.46e- 8
## 6 Cladocopium_Davies_com… 30.8 8.22 1.23 6.69 2.25e-11 4.06e- 8
# save rlogged and csv file
saveRDS(DE_OC_RC_pvalue, file = "Sid_Symbiont_OC_RC_pval.05.rds")
write.csv(DE_OC_RC_pvalue, "Sid_Symbiont_OC_RC_pval.05.csv")
dim(DE_OC_RC_pvalue)
## [1] 5516 7
p-adj < 0.1
results_LC_RC_.1 <- results(dds, contrast = c("group", "Lagoon.Control", "Reef.Control"), alpha = 0.1)
head(results_LC_RC_.1)
## log2 fold change (MLE): group Lagoon.Control vs Reef.Control
## Wald test p-value: group Lagoon.Control vs Reef.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 -4.84684
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 -2.96552
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 -2.98161
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 2.00930
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 -5.70722
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 -5.32772
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 4.568518 -1.06092 0.2887255
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.264373 -1.30964 0.1903161
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 4.574989 -0.65172 0.5145817
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.994891 2.01962 0.0434231
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 4.564831 -1.25026 0.2112048
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 4.567811 -1.16636 0.2434684
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 0.367683
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 0.279468
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.121571
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.299687
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 0.328464
summary(results_LC_RC_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 183, 1.7%
## LFC < 0 (down) : 2831, 26%
## outliers [1] : 4, 0.037%
## low counts [2] : 210, 1.9%
## (mean count < 1)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
### convert to dataframe (from DESeq2 object)
res_LC_RC_.1_df <- data.frame(results_LC_RC_.1)
DE_LC_RC_padj.1 <- res_LC_RC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_LC_RC_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 25.9 -20.9 2.49 -8.38 5.47e-17 5.80e-13
## 2 Cladocopium_Davies_comp… 30.8 7.80 1.03 7.54 4.55e-14 2.41e-10
## 3 Cladocopium_Davies_comp… 49.4 8.36 1.13 7.42 1.21e-13 4.27e-10
## 4 Cladocopium_Davies_comp… 13.8 7.65 1.04 7.35 2.00e-13 5.29e-10
## 5 Cladocopium_Davies_comp… 16.4 5.57 0.761 7.32 2.53e-13 5.38e-10
## 6 Cladocopium_Davies_comp… 29.6 8.13 1.13 7.19 6.61e-13 1.00e- 9
# save rlogged and csv file
saveRDS(DE_LC_RC_padj.1, file = "Sid_Symbiont_DE_genes_LC_RC_padj.1.rds")
write.csv(DE_LC_RC_padj.1, "Sid_Symbiont_DE_genes_LC_RC_padj.1.csv")
up.results_LC_RC_padj.1 = row.names(res_LC_RC_.1_df[res_LC_RC_.1_df$padj<0.1 & !(is.na(res_LC_RC_.1_df$padj)) & res_LC_RC_.1_df$log2FoldChange>0,])
up.results_LC_RC_padj.1 <- as.data.frame(up.results_LC_RC_padj.1)
up.results_LC_RC_padj.1$DEG <- 'Up'
down.results_LC_RC_padj.1 = row.names(res_LC_RC_.1_df[res_LC_RC_.1_df$padj<0.1 & !(is.na(res_LC_RC_.1_df$padj)) & res_LC_RC_.1_df$log2FoldChange<0,])
down.results_LC_RC_padj.1 <- as.data.frame(down.results_LC_RC_padj.1)
down.results_LC_RC_padj.1$DEG <- 'Down'
LC_RC_padj.1_summary <- merge(up.results_LC_RC_padj.1, down.results_LC_RC_padj.1, all=T)
dim(LC_RC_padj.1_summary)
## [1] 3014 3
# save file
write.csv(LC_RC_padj.1_summary, file="Sid_Symbiont_DE_genes_LC_RC_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_LC_RC_pvalue <- res_LC_RC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_LC_RC_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 25.9 -20.9 2.49 -8.38 5.47e-17 5.80e-13
## 2 Cladocopium_Davies_comp… 30.8 7.80 1.03 7.54 4.55e-14 2.41e-10
## 3 Cladocopium_Davies_comp… 49.4 8.36 1.13 7.42 1.21e-13 4.27e-10
## 4 Cladocopium_Davies_comp… 13.8 7.65 1.04 7.35 2.00e-13 5.29e-10
## 5 Cladocopium_Davies_comp… 16.4 5.57 0.761 7.32 2.53e-13 5.38e-10
## 6 Cladocopium_Davies_comp… 9.85 -20.0 2.79 -7.20 6.17e-13 1.00e- 9
# save rlogged and csv files
saveRDS(DE_LC_RC_pvalue, file = "Sid_Symbiont_LC_RC_pval.05.rds")
write.csv(DE_LC_RC_pvalue, "Sid_Symbiont_LC_RC_pval.05.csv")
dim(DE_LC_RC_pvalue)
## [1] 4073 7
low pH vs. ambient pH, but not comparative sites
p-adj < 0.1
results_OO_RC_.1 <- results(dds, contrast = c("group", "Ojo.Ojo", "Reef.Control"), alpha = 0.1)
head(results_OO_RC_.1)
## log2 fold change (MLE): group Ojo.Ojo vs Reef.Control
## Wald test p-value: group Ojo.Ojo vs Reef.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 -4.543439
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 -1.503196
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 -2.131771
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 0.326331
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 -4.809364
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 -5.024383
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 4.57731 -0.992601 0.320905
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.21867 -0.677521 0.498075
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 4.58059 -0.465392 0.641651
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.00756 0.323884 0.746026
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 4.56261 -1.054083 0.291845
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 4.57660 -1.097841 0.272274
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 0.363932
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 0.531906
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.668276
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 0.766357
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 0.336357
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 0.317878
summary(results_OO_RC_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 200, 1.8%
## LFC < 0 (down) : 7263, 67%
## outliers [1] : 4, 0.037%
## low counts [2] : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
# convert to dataframe (from DESeq2 object)
res_OO_RC_.1_df <- data.frame(results_OO_RC_.1)
DE_OO_RC_padj.1 <- res_OO_RC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_OO_RC_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 30.8 8.35 1.03 8.08 6.66e-16 7.21e-12
## 2 Cladocopium_Davies_comp… 29.6 8.83 1.13 7.82 5.49e-15 2.71e-11
## 3 Cladocopium_Davies_comp… 11.6 -35.4 4.58 -7.74 1.00e-14 2.71e-11
## 4 Cladocopium_Davies_comp… 8.59 -35.5 4.58 -7.77 8.05e-15 2.71e-11
## 5 Cladocopium_Davies_comp… 27.5 8.26 1.12 7.38 1.64e-13 2.99e-10
## 6 Cladocopium_Davies_comp… 13.9 7.83 1.06 7.37 1.66e-13 2.99e-10
# save rlogged file
saveRDS(DE_OO_RC_padj.1, file = "Sid_Symbiont_DE_genes_OO_RC_padj.1.rds")
write.csv(DE_OO_RC_padj.1, "Sid_Symbiont_DE_genes_OO_RC_padj.1.csv")
up.results_OO_RC_padj.1 = row.names(results_OO_RC_.1[results_OO_RC_.1$padj<0.1 & !(is.na(results_OO_RC_.1$padj)) & results_OO_RC_.1$log2FoldChange>0,])
up.results_OO_RC_padj.1 <- as.data.frame(up.results_OO_RC_padj.1)
up.results_OO_RC_padj.1$DEG <- 'Up'
down.results_OO_RC_padj.1 = row.names(results_OO_RC_.1[results_OO_RC_.1$padj<0.1 & !(is.na(results_OO_RC_.1$padj)) & results_OO_RC_.1$log2FoldChange<0,])
down.results_OO_RC_padj.1 <- as.data.frame(down.results_OO_RC_padj.1)
down.results_OO_RC_padj.1$DEG <- 'Down'
OO_RC_padj.1_summary <- merge(up.results_OO_RC_padj.1, down.results_OO_RC_padj.1, all=T)
dim(OO_RC_padj.1_summary )
## [1] 7463 3
# save file
write.csv(OO_RC_padj.1_summary , file="Sid_Symbiont_DE_genes_OO_RC_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_OO_RC_pvalue <- res_OO_RC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_OO_RC_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 30.8 8.35 1.03 8.08 6.66e-16 7.21e-12
## 2 Cladocopium_Davies_comp… 29.6 8.83 1.13 7.82 5.49e-15 2.71e-11
## 3 Cladocopium_Davies_comp… 8.59 -35.5 4.58 -7.77 8.05e-15 2.71e-11
## 4 Cladocopium_Davies_comp… 11.6 -35.4 4.58 -7.74 1.00e-14 2.71e-11
## 5 Cladocopium_Davies_comp… 27.5 8.26 1.12 7.38 1.64e-13 2.99e-10
## 6 Cladocopium_Davies_comp… 13.9 7.83 1.06 7.37 1.66e-13 2.99e-10
# save rlogged and csv files
saveRDS(DE_OO_RC_pvalue, file = "Sid_Symbiont_OO_RC_pval.05.rds")
write.csv(DE_OO_RC_pvalue, "Sid_Symbiont_OO_RC_pval.05.csv")
dim(DE_OO_RC_pvalue)
## [1] 6894 7
low pH vs. ambient pH, but not comparative sites
p-adj < 0.1
results_OO_LC_.1 <- results(dds, contrast = c("group", "Ojo.Ojo", "Lagoon.Control"), alpha = 0.1)
head(results_OO_LC_.1)
## log2 fold change (MLE): group Ojo.Ojo vs Lagoon.Control
## Wald test p-value: group Ojo.Ojo vs Lagoon.Control
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange
## <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 3.680556 0.000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 8.693239 1.462328
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 0.989918 0.849842
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 59.846104 -1.682968
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 11.872667 0.897860
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.786528 0.000000
## lfcSE stat pvalue
## <numeric> <numeric> <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont 5.05855 0.000000 1.000000
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 2.50335 0.584149 0.559120
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont 5.05568 0.168097 0.866507
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1.09909 -1.531240 0.125710
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 5.04501 0.177970 0.858747
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 5.05855 0.000000 1.000000
## padj
## <numeric>
## Cladocopium_Davies_comp100143_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100292_c0_seq1_Symbiont 1
## Cladocopium_Davies_comp100526_c0_seq1_Symbiont NA
## Cladocopium_Davies_comp100592_c0_seq1_Symbiont 1
## Cladocopium_Davies_comp100905_c0_seq1_Symbiont 1
## Cladocopium_Davies_comp102141_c0_seq1_Symbiont 1
summary(results_OO_LC_.1)
##
## out of 10822 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 12, 0.11%
## LFC < 0 (down) : 9, 0.083%
## outliers [1] : 4, 0.037%
## low counts [2] : 8391, 78%
## (mean count < 5)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
# convert to dataframe (from DESeq2 object)
res_OO_LC_.1_df <- data.frame(results_OO_LC_.1)
DE_OO_LC_padj.1 <- res_OO_LC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(padj < 0.1) %>%
arrange(padj)
head(DE_OO_LC_padj.1)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 25.9 17.1 2.75 6.22 4.93e-10 1.20e-6
## 2 Cladocopium_Davies_comp… 9.85 16.7 3.08 5.40 6.73e- 8 8.17e-5
## 3 Cladocopium_Davies_comp… 13.8 16.7 3.30 5.07 4.05e- 7 3.27e-4
## 4 Cladocopium_Davies_comp… 12.7 -16.9 3.36 -5.01 5.42e- 7 3.29e-4
## 5 Cladocopium_Davies_comp… 13.2 -18.2 3.75 -4.85 1.26e- 6 6.13e-4
## 6 Cladocopium_Davies_comp… 12.3 -17.8 4.09 -4.35 1.36e- 5 4.72e-3
# save rlogged file
saveRDS(DE_OO_LC_padj.1, file = "Sid_Symbiont_DE_genes_OO_LC_padj.1.rds")
write.csv(DE_OO_LC_padj.1, "Sid_Symbiont_DE_genes_OO_LC_padj.1.csv")
up.results_OO_LC_padj.1 = row.names(results_OO_LC_.1[results_OO_LC_.1$padj<0.1 & !(is.na(results_OO_LC_.1$padj)) & results_OO_LC_.1$log2FoldChange>0,])
up.results_OO_LC_padj.1 <- as.data.frame(up.results_OO_LC_padj.1)
up.results_OO_LC_padj.1$DEG <- 'Up'
down.results_OO_LC_padj.1 = row.names(results_OO_LC_.1[results_OO_LC_.1$padj<0.1 & !(is.na(results_OO_LC_.1$padj)) & results_OO_LC_.1$log2FoldChange<0,])
down.results_OO_LC_padj.1 <- as.data.frame(down.results_OO_LC_padj.1)
down.results_OO_LC_padj.1$DEG <- 'Down'
OO_LC_padj.1_summary <- merge(up.results_OO_LC_padj.1, down.results_OO_LC_padj.1, all=T)
dim(OO_LC_padj.1_summary)
## [1] 21 3
# save file
write.csv(OO_LC_padj.1_summary, file="Sid_Symbiont_DE_genes_OO_LC_padj.1_summary.csv")
This is later used as input for PERMANOVA
DE_OO_LC_pvalue <- res_OO_LC_.1_df %>%
rownames_to_column('gene') %>%
as_tibble() %>%
filter(pvalue < 0.05) %>%
arrange(pvalue)
head(DE_OO_LC_pvalue)
## # A tibble: 6 x 7
## gene baseMean log2FoldChange lfcSE stat pvalue padj
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cladocopium_Davies_comp… 25.9 17.1 2.75 6.22 4.93e-10 1.20e-6
## 2 Cladocopium_Davies_comp… 9.85 16.7 3.08 5.40 6.73e- 8 8.17e-5
## 3 Cladocopium_Davies_comp… 13.8 16.7 3.30 5.07 4.05e- 7 3.27e-4
## 4 Cladocopium_Davies_comp… 12.7 -16.9 3.36 -5.01 5.42e- 7 3.29e-4
## 5 Cladocopium_Davies_comp… 13.2 -18.2 3.75 -4.85 1.26e- 6 6.13e-4
## 6 Cladocopium_Davies_comp… 7.06 15.3 3.50 4.38 1.19e- 5 4.72e-3
# save rlogged and csv files
saveRDS(DE_OO_LC_pvalue, file = "Sid_Symbiont_OO_LC_pval.05.rds")
write.csv(DE_OO_LC_pvalue, "Sid_Symbiont_OO_LC_pval.05.csv")
dim(DE_OO_LC_pvalue)
## [1] 286 7
mcols(results_OO_LC_.1)$description
## [1] "mean of normalized counts for all samples"
## [2] "log2 fold change (MLE): group Ojo.Ojo vs Lagoon.Control"
## [3] "standard error: group Ojo.Ojo vs Lagoon.Control"
## [4] "Wald statistic: group Ojo.Ojo vs Lagoon.Control"
## [5] "Wald test p-value: group Ojo.Ojo vs Lagoon.Control"
## [6] "BH adjusted p-values"
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
##
## attached base packages:
## [1] stats4 parallel grid stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] affy_1.66.0 locfit_1.5-9.4
## [3] arrayQualityMetrics_3.44.0 DEGreport_1.24.1
## [5] DESeq2_1.28.1 SummarizedExperiment_1.18.2
## [7] DelayedArray_0.14.1 matrixStats_0.58.0
## [9] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2
## [11] IRanges_2.22.2 S4Vectors_0.26.1
## [13] BiocManager_1.30.10 data.table_1.14.0
## [15] gridExtra_2.3 knitr_1.31
## [17] plyr_1.8.6 vsn_3.56.0
## [19] Biobase_2.48.0 BiocGenerics_0.34.0
## [21] devtools_2.3.2 usethis_2.0.1
## [23] ggrepel_0.9.1 cowplot_1.1.1
## [25] vegan_2.5-7 lattice_0.20-41
## [27] permute_0.9-5 forcats_0.5.1
## [29] stringr_1.4.0 dplyr_1.0.4
## [31] purrr_0.3.4 readr_1.4.0
## [33] tidyr_1.1.2 tibble_3.1.0
## [35] ggplot2_3.3.3 tidyverse_1.3.0
## [37] ComplexHeatmap_2.4.3 gplots_3.1.1
## [39] pheatmap_1.0.12 circlize_0.4.12
## [41] RColorBrewer_1.1-2
##
## loaded via a namespace (and not attached):
## [1] utf8_1.1.4 tidyselect_1.1.0
## [3] beadarray_2.38.0 htmlwidgets_1.5.3
## [5] RSQLite_2.2.3 AnnotationDbi_1.50.3
## [7] BiocParallel_1.22.0 munsell_0.5.0
## [9] codetools_0.2-18 preprocessCore_1.50.0
## [11] withr_2.4.1 colorspace_2.0-0
## [13] highr_0.8 rstudioapi_0.13
## [15] setRNG_2013.9-1 labeling_0.4.2
## [17] lasso2_1.2-21.1 GenomeInfoDbData_1.2.3
## [19] hwriter_1.3.2 mnormt_2.0.2
## [21] farver_2.0.3 bit64_4.0.5
## [23] rprojroot_2.0.2 vctrs_0.3.6
## [25] generics_0.1.0 xfun_0.21
## [27] R6_2.5.0 illuminaio_0.30.0
## [29] clue_0.3-58 gridSVG_1.7-2
## [31] bitops_1.0-6 cachem_1.0.4
## [33] reshape_0.8.8 assertthat_0.2.1
## [35] scales_1.1.1 nnet_7.3-15
## [37] gtable_0.3.0 processx_3.4.5
## [39] rlang_0.4.10 genefilter_1.70.0
## [41] systemfonts_1.0.1 GlobalOptions_0.1.2
## [43] splines_4.0.3 hexbin_1.28.2
## [45] checkmate_2.0.0 broom_0.7.5
## [47] reshape2_1.4.4 yaml_2.2.1
## [49] modelr_0.1.8 backports_1.2.1
## [51] Hmisc_4.4-2 tools_4.0.3
## [53] psych_2.0.12 logging_0.10-108
## [55] affyio_1.58.0 ellipsis_0.3.1
## [57] jquerylib_0.1.3 ggdendro_0.1.22
## [59] sessioninfo_1.1.1 Rcpp_1.0.6
## [61] base64enc_0.1-3 zlibbioc_1.34.0
## [63] RCurl_1.98-1.2 ps_1.5.0
## [65] prettyunits_1.1.1 openssl_1.4.3
## [67] rpart_4.1-15 GetoptLong_1.0.5
## [69] haven_2.3.1 cluster_2.1.1
## [71] fs_1.5.0 magrittr_2.0.1
## [73] reprex_1.0.0 tmvnsim_1.0-2
## [75] pkgload_1.2.0 hms_1.0.0
## [77] evaluate_0.14 xtable_1.8-4
## [79] XML_3.99-0.5 jpeg_0.1-8.1
## [81] gcrma_2.60.0 readxl_1.3.1
## [83] shape_1.4.5 testthat_3.0.2
## [85] compiler_4.0.3 KernSmooth_2.23-18
## [87] crayon_1.4.1 htmltools_0.5.1.1
## [89] mgcv_1.8-34 Formula_1.2-4
## [91] geneplotter_1.66.0 lubridate_1.7.10
## [93] DBI_1.1.1 dbplyr_2.1.0
## [95] MASS_7.3-53.1 Matrix_1.3-2
## [97] cli_2.3.1 pkgconfig_2.0.3
## [99] foreign_0.8-81 xml2_1.3.2
## [101] svglite_1.2.3.2 annotate_1.66.0
## [103] BeadDataPackR_1.40.0 bslib_0.2.4
## [105] affyPLM_1.64.0 XVector_0.28.0
## [107] rvest_0.3.6 callr_3.5.1
## [109] digest_0.6.27 ConsensusClusterPlus_1.52.0
## [111] Biostrings_2.56.0 base64_2.0
## [113] rmarkdown_2.7 cellranger_1.1.0
## [115] htmlTable_2.1.0 edgeR_3.30.3
## [117] gdtools_0.2.3 gtools_3.8.2
## [119] rjson_0.2.20 lifecycle_1.0.0
## [121] nlme_3.1-152 jsonlite_1.7.2
## [123] askpass_1.1 desc_1.2.0
## [125] limma_3.44.3 fansi_0.4.2
## [127] pillar_1.5.0 Nozzle.R1_1.1-1
## [129] fastmap_1.1.0 httr_1.4.2
## [131] pkgbuild_1.2.0 survival_3.2-7
## [133] glue_1.4.2 remotes_2.2.0
## [135] png_0.1-7 bit_4.0.4
## [137] stringi_1.5.3 sass_0.3.1
## [139] blob_1.2.1 latticeExtra_0.6-29
## [141] caTools_1.18.1 memoise_2.0.0